The Similarity Index
Abstract
Is it possible to calculate a hash value for a document that captures its salient characteristics, such that a repository can be queried for matching values to retrieve all “similar” documents? If so, similar documents could be identified with a simple SQL query, without the need for a full-text search engine. Such a value would allow systems to quickly identify duplicate or similar content before it is checked into a repository, added to an index, or returned in a query result. This value could also help identify other content a user might be interested in, even if they did not explicitly search for it. This paper addresses the question by surveying existing research in this and related areas and by reporting experimental results. The investigation was conducted with the goal of implementing such a solution in a Documentum environment.
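The abstract does not say which hashing technique the paper settles on. As one illustration of a hash that "captures salient characteristics", the sketch below implements SimHash, a locality-sensitive fingerprinting scheme in which similar inputs tend to produce fingerprints that differ in only a few bits. This is a swapped-in technique, not the paper's method; the tokenizer, the 3-word shingles, the MD5-based feature hashing, and all function names are assumptions chosen for the example.

```python
import hashlib
import re

BITS = 64  # fingerprint width; matches the 8 bytes taken from each MD5 digest


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(tokens: list[str], k: int = 3) -> list[str]:
    """Return overlapping k-word shingles, or the bare tokens if the text is too short."""
    if len(tokens) < k:
        return tokens
    return [" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


def simhash(text: str) -> int:
    """Compute a 64-bit SimHash fingerprint; similar texts tend to differ in few bits."""
    weights = [0] * BITS
    for feature in shingles(tokenize(text)):
        # Hash each feature to a 64-bit integer using the first 8 bytes of its MD5 digest.
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            # Each feature votes +1 for bit i if that bit is set in its hash, else -1.
            weights[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set only where the net vote is positive.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    original = "The quick brown fox jumps over the lazy dog near the river bank today"
    revised = "The quick brown fox jumps over the lazy dog near the river bank tonight"
    unrelated = "Quarterly revenue figures were reported to the board on Monday"
    fp = simhash(original)
    print(f"{fp:016x}")
    print("near-duplicate distance:", hamming_distance(fp, simhash(revised)))
    print("unrelated distance:", hamming_distance(fp, simhash(unrelated)))
```

With a scheme like this, a duplicate check becomes a search for stored fingerprints within a small Hamming distance, where the threshold is a tuning parameter. Note that a plain SQL equality or `LIKE` query only finds identical fingerprints; finding near matches generally means comparing bit differences, or splitting each fingerprint into k + 1 bands so that any two fingerprints within k bits of each other share at least one exactly matching band that can be indexed.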
Similar resources
New distance and similarity measures for hesitant fuzzy soft sets
The hesitant fuzzy soft set (HFSS), as a combination of hesitant fuzzy and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered as the important tools in different areas such as ...
A Novel Image Structural Similarity Index Considering Image Content Detectability Using Maximally Stable Extremal Region Descriptor
The image content detectability and image structure preservation are closely related concepts with undeniable role in image quality assessment. However, the most attention of image quality studies has been paid to image structure evaluation, few of them focused on image content detectability. Examining the image structure was firstly introduced and assessed in Structural SIMilarity (SSIM) measu...
Determining specific species and the species contribution in the similarity between soil seed bank and standing vegetation Case study: Lazour rangeland- Firouzkooh
Determining the potential of soil seed bank and its specific species is important for conservation goals and vegetation restoration of rangelands. In this study, the characteristics of soil seed bank and standing vegetation in Lazour mountain rangeland were investigated in order to estimate the rehabilitation ability of the study area in case of possible disturbances. In order to determine the ...
Comparison of Meteorological Drought Indices in Yazd Province
In this research, five indices, the Percent of Normal Precipitation Index (PNPI), Deciles of Precipitation (DPI), Rainfall Anomaly Index (RAI), Bhalme & Mooley Drought Index (BMDI), and Standardized Precipitation Index (SPI), were used to investigate drought at the Yazd synoptic station and 31 non-synoptic stations across the province. For this purpose, the present statistical errors were reconstructed vi...
Ranking and Comparison of the Counties of Lorestan Province in Health and Health Services Using the TOPSIS Method
Background: In spite of the great importance of health and health services, the imbalance in distribution of such services has always been one of the main problems of planners. This research was carried out with the aim of ranking and comparing health and health services in cities in Lorestan province. Materials and Methods: Data was collected from books and documents, and from experts in the ...
Fingerprinting and genetic diversity evaluation of rice cultivars using Inter Simple Sequence Repeat marker
Rice as one of the most important agricultural crops has a putative potential for ensuring food security and addressing poverty in the world. In the present study, in order to provide basic information to improve rice through breeding programs, Inter Simple Sequence Repeat marker (ISSR) was used for DNA fingerprinting and finding genetic relationships among 32 different cultivars. In this study...